Keyword Extraction using Clustering and Semantic Analysis

Authors

  • Mohamed H. Haggag
  • Ahmed Basil
Abstract

Keywords are a list of significant words or terms that best represent a document's content in brief and relate to its textual context. Keyword extraction models are categorized as statistical, linguistic, machine learning, or a combination of these approaches. This paper introduces a model that extracts keywords by forming word pairs and clustering these pairs based on semantic similarity, which is computed using the Lesk algorithm and WordNet, a lexical database for the English language. The model also uses a statistical method to ensure cluster cohesion and provide more reliable results, because the final keywords are selected from these clusters. The paper also presents three other basic keyword extraction approaches, which are used to measure the efficiency of the main approach. The proposed model showed improvement over the three other approaches in both precision and recall.
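The pipeline outlined in the abstract (score word pairs, cluster by similarity, pick keywords from the clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the pairing, clustering, or cohesion details, so everything here is an assumption. The `GLOSSES` dictionary is a hypothetical stand-in for WordNet glosses, `gloss_overlap` is a simplified Lesk-style overlap count, the clustering is plain single-link grouping over a threshold, and the "statistical" step is reduced to a minimum cluster size plus term frequency.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-glossary standing in for WordNet glosses (assumption).
GLOSSES = {
    "bank": "financial institution that accepts deposits and lends money",
    "loan": "money that a financial institution lends to a borrower",
    "deposit": "money placed in a financial institution account",
    "river": "large natural stream of water flowing to the sea",
    "water": "clear liquid that forms a stream lake or sea",
}

STOPWORDS = {"a", "an", "and", "in", "of", "or", "that", "the", "to"}


def gloss_overlap(w1, w2):
    """Lesk-style score: number of content words shared by two glosses."""
    g1 = set(GLOSSES.get(w1, "").split()) - STOPWORDS
    g2 = set(GLOSSES.get(w2, "").split()) - STOPWORDS
    return len(g1 & g2)


def cluster_words(words, threshold=2):
    """Link every word pair whose score meets the threshold, then
    return the connected components as clusters (single-link)."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the cluster representative.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for w1, w2 in combinations(sorted(words), 2):
        if gloss_overlap(w1, w2) >= threshold:
            parent[find(w1)] = find(w2)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), set()).add(w)
    return list(clusters.values())


def pick_keywords(tokens, threshold=2, min_cluster_size=2):
    """Keep clusters with at least min_cluster_size members (a crude
    cohesion filter, assumed here) and return the most frequent word
    of each kept cluster."""
    counts = Counter(t for t in tokens if t in GLOSSES)
    clusters = cluster_words(set(counts), threshold)
    kept = [c for c in clusters if len(c) >= min_cluster_size]
    return sorted(max(c, key=lambda w: (counts[w], w)) for c in kept)


tokens = "bank loan bank deposit river water river".split()
print(pick_keywords(tokens))
```

With real data, `gloss_overlap` would be replaced by a similarity computed from disambiguated WordNet senses, and the threshold and cohesion filter would need tuning against a labeled keyword set.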

Similar resources

Keyword Extraction for Webpage Clusters

The volume of unstructured information presented on the Internet is constantly increasing, together with the total amount of websites and their contents. To process this vast amount of information it is important to distinguish different clusters of related webpages. Such clusters are used, for example, for template induction, keyword extraction, and recommendation algorithms. A variety of appl...

Experiments in Clustering Documents for Automatic Acquisition of Lexical Semantic Networks for Polish

The aim of this work is to explore document clustering techniques for the needs of semi-automatic construction of a lexical semantic network for Polish. Although the majority of research in this area is based on measures of distributional similarity calculated from co-occurrences of words in large collections of documents, we wanted to approach a difficult problem of meaning ambiguity resolutio...

Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks

Keyword and keyphrase extraction is an important problem in natural language processing, with applications ranging from summarization to semantic search to document clustering. Graph-based approaches to keyword and keyphrase extraction avoid the problem of acquiring a large in-domain training corpus by applying variants of PageRank algorithm on a network of words. Although graph-based approache...

Semantic Correspondence of Database Schema from Heterogeneous Databases using Self-Organizing Map

This paper provides a framework for semantic correspondence of heterogeneous databases using a self-organizing map. It solves the problem of overlapping between different databases due to their different schemas. Clustering technique using self-organizing maps (SOM) is tested and evaluated to assess its performance when using different kinds of data. Preprocessing of database is performed prior to...

Analysis of Statistical Keyword Extraction Methods for Incremental Clustering

Incremental clustering is a very useful approach to organize dynamic text collections. Due to the time/space restrictions for incremental clustering, the textual documents must be preprocessed to maintain only their most important information. Statistical keyword extraction methods from single documents are useful in this scenario. However, different statistical methods have different assumptio...

Publication date: 2014